Introduction

The data set used in this script combines data from the American Community Survey (ACS) and the Washington Office of Superintendent of Public Instruction (OSPI). Both sources provide person-level data, so each indicator can be broken out by the six equity demographic groups of interest.

Access data

PUMS and OSPI data (Elmer)

This data set was compiled from PUMS data.

Looking at the fields in the data set
## [1] "Disability_cat" "Income_cat"     "LEP_cat"        "Older_cat"     
## [5] "POC_cat"        "Youth_cat"      "Total"
##  [1] "educational_attainment"  "healthcare_coverage"    
##  [3] "median_household_income" "household_poverty"      
##  [5] "housing_cost_burden"     "median_gross_rent"      
##  [7] "crowding"                "SNAP"                   
##  [9] "internet_access"         "Kindergarten readiness"
##  [1] 2021 2022 2019 2018 2017 2016 2015 2014 2013 2012 2011
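
Listings like the ones above can be produced with `unique()`. The following is a minimal sketch in base R on toy data. The data frame `df` and the column names `focus_type`, `indicator_type`, and `data_year` are assumptions for illustration and may not match the names used in the script.

```r
# Toy stand-in for the compiled data set; all column names are hypothetical.
df <- data.frame(
  focus_type     = c("POC_cat", "Income_cat", "POC_cat", "LEP_cat"),
  indicator_type = c("crowding", "SNAP", "SNAP", "crowding"),
  data_year      = c(2021, 2022, 2021, 2019),
  stringsAsFactors = FALSE
)

# List the distinct values of each field, in order of first appearance.
focus_types <- unique(df$focus_type)
indicators  <- unique(df$indicator_type)
years       <- unique(df$data_year)

print(focus_types)
print(indicators)
print(years)
```

Because `unique()` keeps the order of first appearance, the years are not necessarily sorted; wrapping the call in `sort()` gives a chronological list.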

Kindergarten Readiness

1. Explore data

In this section we make sure that the data set makes sense.

Data fields

consistent base data

  • There should be 5 geographies - the 4 counties and the Region.
  • There should be 6 equity focus categories - POC, income, disability, youth, older adult, and LEP
    • 2 sub-categories per focus group (e.g. people of color, non-people of color)
## [1] "King"      "Kitsap"    "Pierce"    "Snohomish" "Region"
## [1] "Income_cat"     "LEP_cat"        "Disability_cat" "POC_cat"



indicator-specific data

These fields will vary by indicator:

  • Type of metric - this will determine how the data are visualized (est = “percent”, “currency”, or “number”)
  • Number of years (5-year span) - this can vary depending on data availability
  • Number of indicator-specific categories - this can vary depending on the indicator of interest, ranging from N/A (median income) to multiple levels (crowding, housing cost burden)
## [1] "share"
##  [1] 2022 2021 2019 2018 2017 2016 2015 2014 2013 2012 2011
## [1] "6 for 6 dimensions"

There are 5 geographies and 4 equity focus groups (each with 2 subgroups). There are 11 years in the data set and the indicator specific field has 1 attribute(s), which means there should be a total of 440 rows.

## [1] 1260

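The actual row count is 1260, not 440. In the tables below, complete cells hold three times the expected number of entries (for example, 24 rather than 8 per year/geography), so a complete data set would have 1320 rows. The 60-row shortfall indicates that some data are missing.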
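
The expected row count is the product of the number of levels in each dimension. A short sketch of that arithmetic follows; the variable names are chosen for illustration, and the counts are the ones stated in the text above.

```r
# Counts taken from the text above.
n_geographies  <- 5   # 4 counties plus the Region
n_focus_groups <- 4   # equity focus groups present for this indicator
n_subgroups    <- 2   # sub-groups per focus group
n_years        <- 11  # years present in the data set
n_attributes   <- 1   # indicator-specific attributes

expected_rows <- n_geographies * n_focus_groups * n_subgroups *
  n_years * n_attributes
print(expected_rows)

# Compare against the observed row count reported above.
observed_rows <- 1260
print(observed_rows - expected_rows)
```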

checking for missing data

Year / geography

If we look at the data by year and geography, there should be 8 entries per year/geography.

##       
##        King Kitsap Pierce Region Snohomish
##   2011   24      6     24     24        18
##   2012   24      6     24     24        24
##   2013   24     15     24     24        24
##   2014   24     15     24     24        24
##   2015   24     24     24     24        24
##   2016   24     24     24     24        24
##   2017   24     24     24     24        24
##   2018   24     24     24     24        24
##   2019   24     24     24     24        24
##   2021   24     24     24     24        24
##   2022   24     24     24     24        24

Missing data: Kitsap (2011-2014) and Snohomish (2011) have fewer entries than the other geographies, and 2020 is absent for every geography.
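
A cross-tabulation like the one above can also be scanned programmatically for short cells. This is a sketch under assumptions: the column names `data_year` and `county` are hypothetical, and the toy data are much smaller than the real data set.

```r
# Toy data with hypothetical column names: two years, two geographies.
df <- data.frame(
  data_year = c(2011, 2011, 2011, 2012, 2012, 2012, 2012),
  county    = c("King", "King", "Kitsap", "King", "King", "Kitsap", "Kitsap"),
  stringsAsFactors = FALSE
)

# Cross-tabulate rows by year and geography.
counts <- table(df$data_year, df$county)
print(counts)

# Flag cells that fall short of the count seen in complete cells.
full_count <- max(counts)
short <- as.data.frame(counts, stringsAsFactors = FALSE)
names(short) <- c("data_year", "county", "n")
short <- short[short$n < full_count, ]
print(short)
```

A year that is absent entirely does not appear as a row in the table at all, so it has to be caught separately, for example by comparing the years present against the expected sequence with `setdiff()`.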

Year by equity focus group

If we look at the data by year and focus group, there should be 10 entries per year/focus group.

##       
##        Disability_cat Income_cat LEP_cat POC_cat
##   2011             21         24      27      24
##   2012             24         27      27      24
##   2013             24         30      27      30
##   2014             24         30      27      30
##   2015             30         30      30      30
##   2016             30         30      30      30
##   2017             30         30      30      30
##   2018             30         30      30      30
##   2019             30         30      30      30
##   2021             30         30      30      30
##   2022             30         30      30      30



Year by equity focus sub-group

If we look at the data by year and focus sub-group, there should be 5 entries per year/focus sub-group.

##       
##        English proficient Limited English proficiency Low Income Non-Low Income
##   2011                 15                          12         12             12
##   2012                 15                          12         12             15
##   2013                 15                          12         15             15
##   2014                 15                          12         15             15
##   2015                 15                          15         15             15
##   2016                 15                          15         15             15
##   2017                 15                          15         15             15
##   2018                 15                          15         15             15
##   2019                 15                          15         15             15
##   2021                 15                          15         15             15
##   2022                 15                          15         15             15
##       
##        Non-POC POC With disability Without disability
##   2011      12  12               9                 12
##   2012      12  12              12                 12
##   2013      15  15              12                 12
##   2014      15  15              12                 12
##   2015      15  15              15                 15
##   2016      15  15              15                 15
##   2017      15  15              15                 15
##   2018      15  15              15                 15
##   2019      15  15              15                 15
##   2021      15  15              15                 15
##   2022      15  15              15                 15



Year by indicator attribute

If we look at the data by year and indicator attribute, there should be 40 entries per year/indicator attribute.

##       
##        6 for 6 dimensions
##   2011                 96
##   2012                102
##   2013                111
##   2014                111
##   2015                120
##   2016                120
##   2017                120
##   2018                120
##   2019                120
##   2021                120
##   2022                120



Numeric data

To check for 0s and NULLs

There are no nulls.
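
A check of this kind might look like the following sketch. The column name `estimate` is an assumption, and the toy data deliberately contain one zero and one missing value so that both counts are non-trivial.

```r
# Toy estimates; `estimate` is a hypothetical column name.
df <- data.frame(
  county   = c("King", "Kitsap", "Pierce", "Snohomish"),
  estimate = c(0.52, 0, 0.47, NA),
  stringsAsFactors = FALSE
)

# Count missing values and exact zeros separately.
n_missing <- sum(is.na(df$estimate))
n_zero    <- sum(df$estimate == 0, na.rm = TRUE)
print(c(missing = n_missing, zero = n_zero))

# Inspect the rows behind the zeros before deciding whether to drop them.
zero_rows <- df[!is.na(df$estimate) & df$estimate == 0, ]
print(zero_rows)
```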

To look at distribution of all data - not the most useful visual, but provides a sense of the range of values at a high level in one plot.


This table includes a lot of information about the data set and helps to show the different levels of each field. It provides another way to check if data are available for all counties and all years, or where there may be gaps in the data set.

Removing the ‘0’ data point (Kitsap, 2022, LEP)


Data labels, shares

These charts were generated to ensure that the labels are consistent and make sense across years. Labels had previously been misassigned because tidycensus::pums_variables, the only digital data dictionary available for associating labels with codes, covers only 2017 forward. Most variables have kept consistent codes, but where codes have shifted over time, applying the 2017 lookup to earlier years mischaracterizes categories.

These charts also help to confirm that the shares add up to 100% - only relevant when indicator_attribute has more than one category. The indicator_attribute for median household income is NA.
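
The same check can be done numerically. The sketch below sums shares within each year and sub-group and flags totals that differ from 1. Apart from `indicator_attribute`, the column names are assumptions, and the attribute labels and values are invented toy data.

```r
# Toy shares; column names are hypothetical except `indicator_attribute`,
# and the labels and values are made up for illustration.
df <- data.frame(
  data_year           = c(2021, 2021, 2021, 2022, 2022, 2022),
  focus_attribute     = "POC",
  indicator_attribute = rep(c("Not crowded", "Crowded", "Severely crowded"), 2),
  share               = c(0.90, 0.07, 0.03, 0.88, 0.08, 0.03),
  stringsAsFactors = FALSE
)

# Sum the shares within each year / sub-group combination.
totals <- aggregate(share ~ data_year + focus_attribute, data = df, FUN = sum)

# Flag combinations whose shares do not sum to 1 within a small tolerance.
totals$ok <- abs(totals$share - 1) < 1e-6
print(totals)
```

The tolerance guards against floating-point rounding; in this toy data the 2022 shares sum to 0.99 and are flagged.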

The colors of the charts may not be consistent between the years depending on missing data.

2. Visually explore data

2a. Scatter plots

In this section we start to explore the data visually - distribution by the different dimensions within the data set. These plots are helpful to check for outliers and get a higher level understanding of the data in one visual, before slicing the data by geography and equity focus group in the following sections.

The following code will need to be adjusted to fit the fields specific to the data indicator. For educational attainment, we focus on those with a Bachelor’s degree or higher. The following code establishes the data frame that the rest of the analysis uses. If there are fewer than 2 indicator attributes, this section can be skipped/commented out, but the code will need to be adjusted throughout.

By indicator_attribute

This section isn’t relevant for this specific indicator because there aren’t unique indicator attributes.

By Year


2b. Facets by geography

In this section we explore trends by different groups with MOEs. These charts help to show any missing data by geography, year, or focus group/subgroup.
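
Charts with MOEs typically draw an interval of estimate ± MOE around each point. The sketch below computes those bounds on toy data; the column names `estimate` and `moe` are assumptions and may differ from the fields in the script.

```r
# Toy estimates with margins of error; column names are hypothetical.
df <- data.frame(
  data_year = c(2019, 2021, 2022),
  estimate  = c(0.48, 0.50, 0.53),
  moe       = c(0.04, 0.05, 0.03)
)

# Bounds of the interval implied by each margin of error.
df$lower <- df$estimate - df$moe
df$upper <- df$estimate + df$moe

# Two intervals overlap when each lower bound is below the other's upper bound.
overlaps <- function(i, j) {
  df$lower[i] <= df$upper[j] && df$lower[j] <= df$upper[i]
}
print(df)
print(overlaps(1, 3))
```

Overlapping intervals are only a rough visual check, not a formal significance test.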



3. Developing visuals

In this section we further develop the draft visuals for communicating the results and supporting the narrative for the Equity Tracker webpages. These charts are slightly more refined by slicing the data by geography and equity focus group. The line charts don’t include MOEs, but they help make connections between the same groups over time.

Line charts by geography

High / low vulnerability groups



First/last years



calculated difference b/t

The 5 geographies are all included in the facets by geography, but they could be separated out to create 5 individual charts - one for each geography.
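
One possible way to compare the first and last years for each geography is sketched below in base R. The column names and values are hypothetical toy data, and this is not necessarily the calculation used in the script.

```r
# Toy series; column names are hypothetical.
df <- data.frame(
  county    = rep(c("King", "Pierce"), each = 3),
  data_year = rep(c(2011, 2016, 2022), 2),
  estimate  = c(0.40, 0.45, 0.52, 0.35, 0.37, 0.41),
  stringsAsFactors = FALSE
)

# Keep only the first and last years in the data.
first_year <- min(df$data_year)
last_year  <- max(df$data_year)
first <- df[df$data_year == first_year, c("county", "estimate")]
last  <- df[df$data_year == last_year,  c("county", "estimate")]

# Join on geography and compute the change between the two years.
change <- merge(first, last, by = "county", suffixes = c("_first", "_last"))
change$difference <- change$estimate_last - change$estimate_first
print(change)
```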

Line charts by equity group

High / low vulnerability groups



First/last years



calculated difference b/t

The 6 equity focus groups are all included in the facets by equity focus group, but they could be separated out to create 6 individual charts - one for each focus group.


Cleveland dot plot

Resource for visual
The code to make this type of visual is long - adjust it to the indicator as needed (scale_x_continuous, labs, label, etc.).



4. Save files

This section needs to be edited. Keep the code chunks commented out for now as we draft and refine the visuals.

PNG

HTML

Copy files from Y drive > website folder



5. Archive

This section includes visuals that were determined to be less useful. We didn’t want to lose the work, but didn’t want to include it in the main workflow. Feel free to comment out if you don’t want to adjust the arguments to fit the indicator of interest.

Line chart: all categories

Line chart: by vulnerability



3 visuals

1. Map of most recent data

2. Facet chart of most recent

There are five charts for the different geographies: Region and the 4 counties.

3. Time series

Line chart

By geography

There are 5 charts for the different geographies: Region and the 4 counties.

All years

First/last years

There are 5 charts for the different geographies: Region and the 4 counties.

By equity group

There are 6 charts for the different equity groups: POC, low-income, etc.

All years

First/last years

There are 6 charts for the different equity groups: POC, low-income, etc.

Cleveland dot plot


